skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Editors contains: "Ruiz, Francisco"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Ruiz, Francisco and (Ed.)
    Motivated by personalized healthcare and other applications involving sensitive data, we study online exploration in reinforcement learning with differential privacy (DP) constraints. Existing work on this problem established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP) but did not provide an algorithm with optimal regret. We close this gap for the JDP case by designing an $$\epsilon$$-JDP algorithm with a regret of $$\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$$ which matches the information-theoretic lower bound of non-private learning for all choices of $$\epsilon> S^{1.5}A^{0.5} H^2/\sqrt{T}$$. In the above, $$S$$, $$A$$ denote the number of states and actions, $$H$$ denotes the planning horizon, and $$T$$ is the number of steps. To the best of our knowledge, this is the first private RL algorithm that achieves privacy for free asymptotically as $$T\rightarrow \infty$$. Our techniques — which could be of independent interest — include privately releasing Bernstein-type exploration bonuses and an improved method for releasing visitation statistics. The same techniques also imply a slightly improved regret bound for the LDP case. 
    more » « less
  2. Ruiz, Francisco and (Ed.)
    We consider the problem of universal dynamic regret minimization under exp-concave and smooth losses. We show that appropriately designed Strongly Adaptive algorithms achieve a dynamic regret of $$\tilde O(d^2 n^{1/5} [\mathcal{TV}_1(w_{1:n})]^{2/5} \vee d^2)$$, where $$n$$ is the time horizon and $$\mathcal{TV}_1(w_{1:n})$$ a path variational based on second order differences of the comparator sequence. Such a path variational naturally encodes comparator sequences that are piece-wise linear – a powerful family that tracks a variety of non-stationarity patterns in practice (Kim et al., 2009). The aforementioned dynamic regret is shown to be optimal modulo dimension dependencies and poly-logarithmic factors of $$n$$. To the best of our knowledge, this path variational has not been studied in the non-stochastic online learning literature before. Our proof techniques rely on analysing the KKT conditions of the offline oracle and requires several non-trivial generalizations of the ideas in Baby and Wang (2021) where the latter work only implies an $$\tilde{O}(n^{1/3})$$ regret for the current problem. 
    more » « less
  3. Ruiz, Francisco; Dy, Jennifer; van_de_Meent, Jan-Willem (Ed.)
    We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behaviors of the RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training errors, cross-validations, and generalization errors of RFRR centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from an expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree only depends on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of the RFRR and the KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model. 
    more » « less
  4. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    We consider a task of surveillance-evading path-planning in a continuous setting. An Evader strives to escape from a 2D domain while minimizing the risk of detection (and immediate capture). The probability of detection is path-dependent and determined by the spatially inhomogeneous surveillance intensity, which is fixed but a priori unknown and gradually learned in the multi-episodic setting. We introduce a Bayesian reinforcement learning algorithm that relies on a Gaussian Process regression (to model the surveillance intensity function based on the information from prior episodes), numerical methods for Hamilton-Jacobi PDEs (to plan the best continuous trajectories based on the current model), and Confidence Bounds (to balance the exploration vs exploitation). We use numerical experiments and regret metrics to highlight the significant advantages of our approach compared to traditional graph-based algorithms of reinforcement learning. 
    more » « less
  5. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    In the analysis of data sets consisting of (X, Y)-pairs, a tacit assumption is that each pair corresponds to the same observational unit. If, however, such pairs are obtained via record linkage of two files, this assumption can be violated as a result of mismatch error rooting, for example, in the lack of reliable identifiers in the two files. Recently, there has been a surge of interest in this setting under the term “Shuffled Data” in which the underlying correct pairing of (X, Y)-pairs is represented via an unknown permutation. Explicit modeling of the permutation tends to be associated with overfitting, prompting the need for suitable methods of regularization. In this paper, we propose an exponential family prior on the permutation group for this purpose that can be used to integrate various structures such as sparse and local shuffling. This prior turns out to be conjugate for canonical shuffled data problems in which the likelihood conditional on a fixed permutation can be expressed as product over the corresponding (X,Y)-pairs. Inference can be based on the EM algorithm in which the E-step is approximated by sampling, e.g., via the Fisher-Yates algorithm. The M-step is shown to admit a reduction from n^2 to n terms if the likelihood of (X,Y)-pairs has exponential family form. Comparisons on synthetic and real data show that the proposed approach compares favorably to competing methods. 
    more » « less
  6. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well. This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models. We derive bounds using both a natural exponential-reciprocal decomposition of the softmax as well as an alternative decomposition in terms of the log-sum-exp function. The new bounds are provably and/or numerically tighter than linear bounds obtained in previous work on robustness verification of transformers. As illustrations of the utility of the bounds, we apply them to verification of transformers as well as of the robustness of predictive uncertainty estimates of deep ensembles. 
    more » « less
  7. Ruiz, Francisco; Dy, Jennife; van de Meent, Jan-Willem (Ed.)
    There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity where in each cluster, each protected group is represented with the same proportion as in the entirety. While FairSC algorithm (Kleindessner et al., 2019) is able to find the fairer clustering, it is compromised by high computational costs due to the algorithm’s kernels of computing nullspaces and the square roots of dense matrices explicitly. We present a new formulation of the underlying spectral computation of FairSC by incorporating nullspace projection and Hotelling’s deflation such that the resulting algorithm, called s-FairSC, only involves the sparse matrix-vector products and is able to fully exploit the sparsity of the fair SC model. The experimental results on the modified stochastic block model demonstrate that while it is comparable with FairSC in recovering fair clustering, s-FairSC is 12× faster than FairSC for moderate model sizes. s-FairSC is further demonstrated to be scalable in the sense that the computational costs of s-FairSC only increase marginally compared to the SC without fairness constraints. 
    more » « less
  8. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $$\varepsilon$$-LDP, each user has $$m\ge1$$ samples and the privacy of all $$m$$ samples must be preserved simultaneously. We resolve the following dilemma: While on the one hand having more samples per user should provide more information about the underlying distribution, on the other hand, guaranteeing the privacy of all $$m$$ samples should make the estimation task more difficult. We obtain tight bounds for this problem under almost all parameter regimes. Perhaps surprisingly, we show that in suitable parameter regimes, having $$m$$ samples per user is equivalent to having $$m$$ times more users, each with only one sample. Our results demonstrate interesting phase transitions for $$m$$ and the privacy parameter $$\varepsilon$$ in the estimation risk. Finally, connecting with recent results on shuffled DP, we show that combined with random shuffling, our algorithm leads to optimal error guarantees (up to logarithmic factors) under the central model of user-level DP in certain parameter regimes. We provide several simulations to verify our theoretical findings. 
    more » « less
  9. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables X and Y, we consider the classical scenario where either X causes Y , Y causes X, or there is an unmeasured confounder between X and Y. We show that if X and Y are over a finite domain of size k and are significantly correlated, the minimum number of interventional samples needed is sublinear in k. We give a tight characterization of the tradeoff between observational and interventional data when the number of observational samples is sufficiently large. We build upon techniques for closeness testing and for non-parametric density estimation in different regimes of observational data. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing. 
    more » « less
  10. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention based models on synthetic and real data. 
    more » « less